Yield estimation is a powerful tool in vineyard management, as it allows growers to fine-tune practices to optimize yield and quality. However, estimation is currently performed using manual sampling, which is time-consuming and imprecise. This study demonstrates the application of proximal imaging combined with deep learning for yield estimation in vineyards. Continuous data collection with a vehicle-mounted sensing kit, combined with ground-truth yield data collected at harvest using a commercial yield monitor, allowed the generation of a large dataset of 23,581 yield points and 107,933 images. Furthermore, the study was conducted in a mechanically managed commercial vineyard, representing a challenging environment for image analysis but a common set of conditions in the California Central Valley. Three model architectures were tested: object detection, CNN regression, and transformer models. The object detection model was trained on hand-labeled images to localize grape bunches, and either the bunch count or the pixel area was summed and correlated with grape yield. Conversely, the regression models were trained end-to-end to predict grape yield from image data without the need for hand labeling. Results showed that the transformer model and the object detection model with pixel-area processing performed comparably, with absolute percent errors of 18% and 18.5%, respectively, on a representative held-out dataset. Saliency mapping was used to show that the attention of the CNN model was located near the predicted positions of the grape bunches and at the top of the grapevine canopy. Overall, the study demonstrates the applicability of proximal imaging and deep learning for grape yield prediction at scale. Moreover, the end-to-end modeling approach was able to perform comparably to the object detection methods while eliminating the need for hand labeling.
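As a rough illustration of the pixel-area variant described above, the following sketch (an assumption, not the authors' code) sums detected bunch areas for each yield point and fits a simple linear calibration against the yield-monitor data; the `bunch_pixel_area` helper and all numeric values are placeholders.

```python
# Hypothetical sketch: relate per-image detection output to yield via a
# simple linear calibration, as in the pixel-area variant described above.
import numpy as np
from sklearn.linear_model import LinearRegression

def bunch_pixel_area(boxes):
    """Sum the pixel area of all detected grape bunches in one image.
    `boxes` is assumed to be a list of (x1, y1, x2, y2) detections;
    a mask-based detector would sum mask pixels instead."""
    return sum((x2 - x1) * (y2 - y1) for x1, y1, x2, y2 in boxes)

# areas_per_point: summed bunch area over the images mapped to a yield point
# yields_per_point: ground-truth yield from the commercial yield monitor
areas_per_point = np.array([[1.2e5], [3.4e5], [2.1e5]])   # placeholder values
yields_per_point = np.array([4.1, 9.8, 6.0])              # placeholder values

calib = LinearRegression().fit(areas_per_point, yields_per_point)
predicted = calib.predict(areas_per_point)
mape = np.mean(np.abs(predicted - yields_per_point) / yields_per_point) * 100
print(f"mean absolute percent error: {mape:.1f}%")
```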
Cognitive biases are mental shortcuts humans use in dealing with information and the environment, which result in biased actions and behaviors without people being aware of it themselves. Biases take many forms, with cognitive biases occupying a central role that bears on fairness, accountability, transparency, ethics, law, medicine, and discrimination. Detection of biases is considered a necessary step toward their mitigation. Here, we focus on two cognitive biases - anchoring and recency. Recognition of cognitive bias in computer science has largely been in the domain of information retrieval, where bias is identified at an aggregate level with the help of annotated data. Proposing a different direction for bias detection, we offer a principled approach, along with machine learning, to detect these two cognitive biases from web logs of users' actions. Our individual user-level detection makes it truly personalized and does not rely on annotated data. Instead, we start from two basic principles established in cognitive psychology, use a modified training of an attention network, and interpret the attention weights in a novel way according to those principles to infer and distinguish between the two biases. The personalized approach allows detection for specific users who are susceptible to these biases while performing their tasks, and can help build awareness among them toward bias mitigation.
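To make the interpretation step concrete, here is a toy sketch of how attention weights over one user's time-ordered action sequence could be read as an anchoring-like or recency-like signal; the window size, thresholding, and labels are illustrative assumptions rather than the paper's procedure.

```python
# Illustrative sketch only: compare attention mass on early vs. recent
# actions as a proxy for anchoring-like vs. recency-like behavior.
import numpy as np

def bias_signal(attn_weights, window_frac=0.2):
    """attn_weights: 1-D attention weights over a user's actions,
    ordered in time and assumed to sum to 1 (softmax output)."""
    n = len(attn_weights)
    w = max(1, int(n * window_frac))
    early_mass = attn_weights[:w].sum()    # attention on the earliest actions
    recent_mass = attn_weights[-w:].sum()  # attention on the latest actions
    if early_mass > recent_mass:
        return "anchoring-like", float(early_mass)
    return "recency-like", float(recent_mass)

attn = np.array([0.30, 0.20, 0.10, 0.10, 0.10, 0.05, 0.05, 0.10])
print(bias_signal(attn))   # ('anchoring-like', 0.3)
```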
X-ray micro-computed tomography (X-ray microCT) has enabled the characterization of properties and processes that take place in plants and soils at the micron scale. Despite the widespread use of this advanced technique, major limitations in both hardware and software restrict the speed and accuracy of image processing and data analysis. Recent advances in machine learning, particularly the application of convolutional neural networks to image analysis, have enabled rapid and accurate segmentation of image data. Yet, challenges remain in applying convolutional neural networks to the analysis of environmentally and agriculturally relevant images. Specifically, there is a disconnect between the computer scientists and engineers who build these AI/ML tools and the potential end users in agricultural research, who may be unsure of how to apply these tools in their own work. Additionally, the computing resources required for training and applying deep learning models are unusual compared to traditional computing systems and are more typical of computer gaming systems or graphic design work. To address these challenges, we developed a modular workflow for applying convolutional neural networks to X-ray microCT images using low-cost resources in Google's Colaboratory web application. Here we present the results of the workflow, illustrating how parameters can be optimized to achieve the best results using example scans of walnut leaves, almond flower buds, and soil aggregates. We anticipate that this framework will accelerate the adoption and use of emerging deep learning techniques in the plant and soil sciences.
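To give a concrete sense of the kind of model such a Colaboratory workflow trains, here is a minimal sketch, assuming a tiny PyTorch encoder-decoder for binary segmentation of a single 2-D microCT slice; the architecture, layer sizes, and random placeholder data are not the published workflow.

```python
# Minimal sketch: a tiny encoder-decoder that produces a per-pixel logit
# for foreground/background segmentation of a 2-D slice.
import torch
import torch.nn as nn

class TinySegNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        )
        self.decode = nn.Sequential(
            nn.Upsample(scale_factor=2, mode="nearest"),
            nn.Conv2d(32, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 1),   # per-pixel logit
        )

    def forward(self, x):
        return self.decode(self.encode(x))

slices = torch.rand(4, 1, 256, 256)                    # placeholder slices
masks = (torch.rand(4, 1, 256, 256) > 0.5).float()     # placeholder labels
loss = nn.BCEWithLogitsLoss()(TinySegNet()(slices), masks)
loss.backward()
print(float(loss))
```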
Trait measurement is critical for the plant breeding and agricultural production pipeline. Typically, a suite of plant traits is measured using laborious manual measurements and then used to train and/or validate higher-throughput trait estimation techniques. Here, we introduce a relatively simple convolutional neural network (CNN) model that accepts multiple sensor inputs and predicts multiple continuous trait outputs, i.e., a multi-input, multi-output CNN (MIMO-CNN). Furthermore, we introduce deformable convolutional layers into this network architecture (MIMO-DCNN), which allow the model to adaptively adjust its receptive field, model complex variable geometric transformations in the data, and fine-tune the continuous trait outputs. We examine how the MIMO-CNN and MIMO-DCNN models perform on a multi-input (i.e., RGB and depth images), multi-trait output lettuce dataset from the 2021 Autonomous Greenhouse Challenge. Ablation studies were conducted to examine the effect of using single versus multiple inputs, and single versus multiple outputs. The MIMO-DCNN model achieved a normalized mean squared error (NMSE) of 0.068, a substantial improvement over the top 2021 leaderboard score of 0.081. Open-source code is provided.
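A hedged sketch of the multi-input, multi-output idea is shown below: separate convolutional branches for the RGB and depth inputs are pooled, concatenated, and passed to a shared regression head that emits several trait values at once. The layer sizes, trait count, and trait names are assumptions, not the authors' architecture (which additionally uses deformable convolutions in the MIMO-DCNN variant).

```python
# Sketch of a multi-input, multi-output CNN: two sensor branches, one
# shared head predicting several continuous traits per plant.
import torch
import torch.nn as nn

def branch(in_ch):
    return nn.Sequential(
        nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )

class MIMOCNN(nn.Module):
    def __init__(self, n_traits=4):
        super().__init__()
        self.rgb_branch = branch(3)     # RGB input
        self.depth_branch = branch(1)   # depth input
        self.head = nn.Sequential(
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, n_traits),    # e.g. fresh weight, height, diameter, leaf area
        )

    def forward(self, rgb, depth):
        feats = torch.cat([self.rgb_branch(rgb), self.depth_branch(depth)], dim=1)
        return self.head(feats)

pred = MIMOCNN()(torch.rand(2, 3, 128, 128), torch.rand(2, 1, 128, 128))
print(pred.shape)   # (2, 4): one row of trait predictions per plant
```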
Table structure recognition is necessary for comprehensive understanding of documents. Tables in unstructured business documents are hard to parse due to the high diversity of layouts, variation in content, and the presence of empty cells. The problem is particularly difficult because of the challenges in identifying individual cells using visual or linguistic context, or both. Accurately detecting table cells (including empty cells) simplifies structure extraction and therefore becomes the prime focus of our work. We propose a novel object-detection-based deep model that captures the inherent alignment of cells within tables and is fine-tuned for fast optimization. Despite accurate detection of cells, recognizing the structure of dense tables may still be challenging because of the difficulty of capturing long-range row/column dependencies in the presence of multi-row/column spanning cells. We therefore also aim to improve structure recognition by deriving a novel rectilinear-graph-based formulation. From a semantics perspective, we highlight the significance of empty cells in a table. To account for these cells, we suggest an enhancement to a popular evaluation criterion. Finally, we introduce a modestly sized evaluation dataset with an annotation style inspired by human cognition to encourage new approaches to the problem. Our framework improves the previous state-of-the-art performance by a 2.7% average F1-score on benchmark datasets.
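As a much-simplified illustration of turning detected cells into structure, the toy sketch below pairs cell boxes into same-row and same-column relations by overlap of their vertical and horizontal extents; this is an assumed simplification for intuition only, not the rectilinear-graph formulation proposed in the work.

```python
# Toy sketch: derive same-row / same-column relations from detected cell boxes.
def overlap(a0, a1, b0, b1):
    return max(0.0, min(a1, b1) - max(a0, b0))

def cell_relations(boxes, thresh=0.5):
    """boxes: list of (x1, y1, x2, y2) detected cell boxes."""
    same_row, same_col = [], []
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            ax1, ay1, ax2, ay2 = boxes[i]
            bx1, by1, bx2, by2 = boxes[j]
            v = overlap(ay1, ay2, by1, by2) / min(ay2 - ay1, by2 - by1)
            h = overlap(ax1, ax2, bx1, bx2) / min(ax2 - ax1, bx2 - bx1)
            if v >= thresh:
                same_row.append((i, j))
            if h >= thresh:
                same_col.append((i, j))
    return same_row, same_col

boxes = [(0, 0, 50, 20), (60, 0, 110, 20), (0, 30, 50, 50)]
print(cell_relations(boxes))   # ([(0, 1)], [(0, 2)])
```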
In this paper we explore the task of modeling (semi) structured object sequences; in particular, we focus our attention on the problem of developing a structure-aware input representation for such sequences. In such sequences, we assume that each structured object is represented by a set of key-value pairs which encode the attributes of the structured object. Given a universe of keys, a sequence of structured objects can then be viewed as an evolution of the values for each key, over time. We encode and construct a sequential representation using the values for a particular key (Temporal Value Modeling - TVM) and then self-attend over the set of key-conditioned value sequences to create a representation of the structured object sequence (Key Aggregation - KA). We pre-train and fine-tune the two components independently and present an innovative training schedule that interleaves the training of both modules with shared attention heads. We find that this iterative two-part training results in better performance than a unified network with hierarchical encoding, as well as over other methods that use a {\em record-view} representation of the sequence \cite{de2021transformers4rec} or a simple {\em flattened} representation of the sequence. We conduct experiments using real-world data to demonstrate the advantage of interleaving TVM-KA on multiple tasks, along with detailed ablation studies motivating our modeling choices. We find that our approach performs better than flattening sequence objects and also allows us to operate on significantly larger sequences than existing methods.
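A hedged sketch of the two-stage idea is given below: encode each key's value sequence over time (TVM), then self-attend across the per-key representations (KA). The dimensions, pooling, and layer choices are assumptions for illustration, not the paper's implementation or its interleaved training schedule.

```python
# Sketch: per-key temporal encoding (TVM) followed by attention over keys (KA).
import torch
import torch.nn as nn

batch, n_keys, seq_len, d = 2, 6, 10, 32

value_encoder = nn.TransformerEncoder(               # TVM: one sequence per key
    nn.TransformerEncoderLayer(d_model=d, nhead=4, batch_first=True),
    num_layers=1,
)
key_aggregator = nn.MultiheadAttention(d, num_heads=4, batch_first=True)   # KA

# Placeholder embedded values: (batch, n_keys, seq_len, d)
values = torch.rand(batch, n_keys, seq_len, d)

# Encode each key's value sequence and mean-pool over time.
per_key = value_encoder(values.flatten(0, 1)).mean(dim=1).view(batch, n_keys, d)

# Self-attend over the key-conditioned representations.
obj_seq_repr, _ = key_aggregator(per_key, per_key, per_key)
print(obj_seq_repr.shape)   # (2, 6, 32); pool over keys for a single vector
```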
Optical coherence tomography (OCT) captures cross-sectional data and is used for the screening, monitoring, and treatment planning of retinal diseases. Technological developments to increase the speed of acquisition often result in systems with a narrower spectral bandwidth, and hence a lower axial resolution. Traditionally, image-processing-based techniques have been utilized to reconstruct subsampled OCT data; more recently, deep-learning-based methods have been explored. In this study, we simulate reduced axial scan (A-scan) resolution by Gaussian windowing in the spectral domain and investigate the use of a learning-based approach for image feature reconstruction. In anticipation of the reduced resolution that accompanies wide-field OCT systems, we build upon super-resolution techniques to explore methods to better aid clinicians in their decision-making and improve patient outcomes, by reconstructing lost features using a pixel-to-pixel approach with an altered super-resolution generative adversarial network (SRGAN) architecture.
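The windowing step can be sketched directly: applying a Gaussian window to an A-scan's spectral interferogram before the inverse FFT narrows the effective bandwidth and broadens the axial point-spread function. The sample count and window width below are assumed values for illustration.

```python
# Sketch: simulate a narrower spectral bandwidth by Gaussian windowing
# the spectral-domain signal before reconstructing the A-scan.
import numpy as np

n_samples = 2048
k = np.arange(n_samples)                       # spectral (wavenumber) samples
spectrum = np.random.randn(n_samples)          # placeholder raw interferogram

# Gaussian window centered mid-spectrum; smaller sigma -> narrower bandwidth.
sigma = n_samples / 8
window = np.exp(-0.5 * ((k - n_samples / 2) / sigma) ** 2)

full_res_ascan = np.abs(np.fft.ifft(spectrum))
low_res_ascan = np.abs(np.fft.ifft(spectrum * window))
print(full_res_ascan.shape, low_res_ascan.shape)
```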
Real-life tools for decision-making in many critical domains are based on ranking results. With the increasing awareness of algorithmic fairness, recent works have presented measures for fairness in ranking. Many of those definitions consider the representation of different ``protected groups'' in the top-$k$ ranked items, for any reasonable $k$. Given the protected groups, confirming algorithmic fairness is a simple task. However, the groups' definitions may be unknown in advance. In this paper, we study the problem of detecting groups with biased representation in the top-$k$ ranked items, eliminating the need to pre-define protected groups. The number of possible such groups can be exponential, making the problem hard. We propose efficient search algorithms for two different fairness measures: global representation bounds, and proportional representation. We then propose a method to explain the bias in the representation of groups utilizing the notion of Shapley values. We conclude with an experimental study showing the scalability of our approach and demonstrating the usefulness of the proposed algorithms.
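One simple reading of proportional representation can be sketched as follows: a group's share of the top-$k$ items should stay close to its share of the whole ranking for every reasonable $k$. The tolerance, minimum $k$, and the exact measure below are illustrative assumptions, not the definitions used in the paper.

```python
# Toy sketch: flag a group whose top-k share drifts from its overall share.
def proportional_representation_ok(ranking, group, min_k=5, tol=0.15):
    """ranking: list of item ids, best first; group: set of item ids."""
    overall = sum(item in group for item in ranking) / len(ranking)
    for k in range(min_k, len(ranking) + 1):
        top_k_share = sum(item in group for item in ranking[:k]) / k
        if abs(top_k_share - overall) > tol:
            return False, k          # biased representation detected at this k
    return True, None

ranking = list(range(20))
group = {0, 1, 2, 3, 12, 15, 19}     # placeholder group definition
print(proportional_representation_ok(ranking, group))   # (False, 5)
```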
The previous fine-grained datasets mainly focus on classification and are often captured in a controlled setup, with the camera focusing on the objects. We introduce the first Fine-Grained Vehicle Detection (FGVD) dataset in the wild, captured from a moving camera mounted on a car. It contains 5502 scene images with 210 unique fine-grained labels of multiple vehicle types organized in a three-level hierarchy. While previous classification datasets also include makes for different kinds of cars, the FGVD dataset introduces new class labels for categorizing two-wheelers, autorickshaws, and trucks. The FGVD dataset is challenging as it has vehicles in complex traffic scenarios with intra-class and inter-class variations in types, scale, pose, occlusion, and lighting conditions. Current object detectors like YOLOv5 and Faster R-CNN perform poorly on our dataset due to a lack of hierarchical modeling. Along with providing baseline results for existing object detectors on the FGVD dataset, we also present the results of a combination of an existing detector and the recent Hierarchical Residual Network (HRN) classifier for the FGVD task. Finally, we show that FGVD vehicle images are the most challenging to classify among the fine-grained datasets.
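The detector-plus-classifier combination mentioned above can be pictured as a simple two-stage pipeline: a coarse detector proposes vehicle boxes and each crop is passed to a fine-grained hierarchical classifier. The stubs below are placeholders, not YOLOv5, Faster R-CNN, or the HRN classifier.

```python
# Hypothetical two-stage sketch: detect vehicles, then classify each crop
# into a three-level (type, make, model) label.
import numpy as np

def detect_then_classify(image, detector, fine_classifier):
    """Return one (box, (type, make, model)) pair per detection."""
    results = []
    for (x1, y1, x2, y2) in detector(image):
        crop = image[y1:y2, x1:x2]                 # assumes an HxWxC array
        results.append(((x1, y1, x2, y2), fine_classifier(crop)))
    return results

def stub_detector(image):
    h, w = image.shape[:2]
    return [(0, 0, w // 2, h // 2)]                # stub: a single box

def stub_hierarchical_classifier(crop):
    return ("two-wheeler", "make-A", "model-X")    # stub three-level label

image = np.zeros((480, 640, 3), dtype=np.uint8)
print(detect_then_classify(image, stub_detector, stub_hierarchical_classifier))
```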
Three main points: 1. Data Science (DS) will be increasingly important to heliophysics; 2. Methods of heliophysics science discovery will continually evolve, requiring the use of learning technologies [e.g., machine learning (ML)] that are applied rigorously and that are capable of supporting discovery; and 3. To grow with the pace of data, technology, and workforce changes, heliophysics requires a new approach to the representation of knowledge.